Which end of the data egg gets broken first, big or little?
Do you start with the MSB or the LSB? For the proposed
IEEE 896 bus, the most practical approach may be the little-endian one.
Several bus standards are being developed by the
IEEE, such as the proposed IEEE 802 standard for
local-area networks. Other standardization efforts by
organizations such as the ISO and the IEC apply to
industrial buses and long-haul networks. Yet other
standards aim at closely coupled systems. The P896
backplane bus, 1 a proposal being developed by the
IEEE Computer Society, the European Workshop on
Industrial Computing Systems, and the Institution of
Electrical Engineers, has been designed to be the
backbone of a tightly coupled multiprocessor system in
which different boards communicate through a common
memory (Figure 1).

Figure 1. A nonhomogeneous tightly coupled multiprocessor.

Standards should guarantee compatibility between
products from different manufacturers and of different
technical conception. Computing systems interconnected
by standard buses are often nonhomogeneous,
and the standard has the difficult task of guaranteeing
communication between building blocks having
different speeds, signal protocols, and data formats.

The lower protocol layers, which ensure mechanical,
electrical, and timing compatibility, govern how information
is transmitted, not what information is transmitted.
Beyond this physical compatibility, the modules must 
agree on a common format for data transmission and 
storage. Such a convention requires a much broader 
standardization than the physical layers, since it directly 
concerns the instruction set and the architecture of the 
processors. 

Until now, the manufacturers of processors have not 
agreed on any common data representation, with the 
result that it is not possible for a processor to interpret 
data in memory without knowing its source. An 
automatic translation within the bus interface is usually 
not feasible, since the interface would need an imprac- 
tical amount of knowledge about what the processor is 
intending to do. 

The problem gets worse when software layers are in- 
volved. It is reasonable to assume that programs that 
run on different processor types are compiled in- 
dependently. These programs must, however, com- 
municate with each other by means of common data 
structures such as mailboxes and ports. Unfortunately,
compilers have little respect for, or knowledge of, other
compilers' conventions, so most of the time compatibility
has to be achieved by rather primitive and inefficient
mechanisms. The data compatibility problem is present
each time data are exchanged between machines of dif- 
ferent type or even of different serial number. It appears 
in multiprocessor computers, in networks, and each 
time a tape or a floppy disk written on one type of 
machine must be read on another. 

In this article, I analyze the problem of data represen- 
tation and show to what extent it can be solved at the 
bus interface level. I then consider how future 32-bit 
processors should be interfaced, and I give the rationale 
for the choice of the bus data format in the EDISG
proposal for the IEEE P896 Future Bus standard. 



Little-endians vs. big-endians

In Jonathan Swift's Gulliver's Travels, Gulliver, the
author's hero, is shipwrecked and washed ashore on
Lilliput, whose six-inch-tall inhabitants are required
by law to break their eggs only at the little ends. Of
course, all those citizens who habitually break their
eggs at the big ends are angered by the proclamation.
Civil war breaks out between the little-endians
and the big-endians, resulting in the big-endians taking
refuge on a nearby island, the kingdom of
Blefuscu. The controversy is ethically and politically
important for the Lilliputians: Swift has 11,000
Lilliputian rebels die over the egg question.

Swift was satirizing the causes of the devastating
religious wars of his day. His point was that warring
over matters of religious conviction is just as absurd
as warring over egg-breaking, and that everyone should
follow his own preferred way.

In his October 1981 article in Computer, "On Holy
Wars and a Plea for Peace," Danny Cohen applied
Swift's terms to the debate over data format in the
microcomputer world. Cohen asserted that the difference
between sending information with the little
or big end first is indeed trivial, but that agreement
on a single way of sending is not trivial at all. To
avoid anarchy in the microcomputer kingdom, everyone
must break the data egg on the same end. Which
end is chosen is unimportant; agreeing on it is what
matters. Cohen suggested a coin toss.

Here, Hubert Kirrmann investigates the problems
encountered in interfacing both little-endian and
big-endian devices to a standard microcomputer system
bus. The bus is the proposed IEEE 896 backplane,
and Kirrmann recommends that it be little-endian. He
favors that approach not as a matter of technical
conscience but as one of technical practicality. Ease
of interfacing, low cost, and coming developments in
the microcomputer kingdom favor the little end.
Hence, the coin toss may not be the solution of
choice for the citizens of the 896 province.



Data format compatibility

Each time data are transferred between computers of
different types, a common data format must be
employed. This is not easy to achieve, since the user
normally has no control over data formats, especially if she
or he programs in a high-level language. Suppose that a
processor is sampling data and that it sends them for
processing to another processor. The sending program 
has been written in Pascal and defines the data to be sent 
as being of type "exchange": 



TYPE
  daytime  = ARRAY[1..12] OF CHAR;
  channel  = 0..255;
  measure  = REAL;
  exchange = RECORD
               date:       daytime;
               IO_channel: channel;
               sample:     measure
             END;



Even if the receiver program has also been programmed
in Pascal and also defines the data as being of type
"exchange," it is unlikely that the data will be correctly
interpreted if the computers are different or even if the
computers are the same but are running two different
compiler versions. So an underlying common data format
must be established to allow programs to exchange
data in a consistent way. At the end of this article, a
proposal for a standardized data format is given. Ideally,
the agreement on the data format should be enforced by 
the compilers, not by the processor or by the user. As 
this is not the case today, the user must have a 
knowledge of the underlying data structure at least for 
all routines that communicate with the outside world. So 
we will first review the data formats used by most of to- 
day's processors. 
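
To make the problem concrete, here is a minimal Python sketch (not part of the original article) that serializes the "exchange" record in two byte orders using the standard struct module. The field layout (12 characters, one unsigned byte, one 32-bit float with no padding) and the name pack_exchange are assumptions chosen for illustration; a real compiler may pad or order the fields differently.

```python
import struct

def pack_exchange(date: bytes, io_channel: int, sample: float, byte_order: str) -> bytes:
    """Serialize the record; byte_order is '<' (little-endian) or '>' (big-endian)."""
    # 12s = 12 characters, B = unsigned byte (0..255), f = 32-bit float
    return struct.pack(byte_order + "12sBf", date, io_channel, sample)

le_image = pack_exchange(b"830815120000", 7, 1.5, "<")
be_image = pack_exchange(b"830815120000", 7, 1.5, ">")

# The characters and the single byte are identical in both images,
# but the four bytes of the REAL appear in opposite order.
same_prefix = le_image[:13] == be_image[:13]
reversed_real = le_image[13:] == be_image[13:][::-1]

# A receiver that assumes the wrong byte order decodes a different number.
(misread,) = struct.unpack(">f", le_image[13:])
print(same_prefix, reversed_real, misread == 1.5)
```

Only the multi-byte field is affected here, which is why the article treats bytes separately from words, quads, and reals.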

Figure 2. One byte represented in little-endian notation.

Figure 3. 16-bit integer formats.



Processor fixed-data types 

In the following, we shall compare the data formats 
used by different processors. We will consider only the 
data structures recognized at the instruction set level by 
the hardware. We will distinguish between those data 
types that are public and that can be exchanged with 
other processors, like integers, reals, and pointers, and 
those that are private and are unique to a processor, like 
the instruction format and descriptors and call frames. 

Software-dictated data structures, like sets, files or 
records, and system configuration tables, are not con- 
sidered, although these data structures are already being 
cast in silicon today. We will come back later to the case 
of the software data structure. 

We shall use "byte" to mean an 8-bit unit of information,
"word" a 16-bit unit, "triplet" a 24-bit unit, and
"quad" a 32-bit unit.

Byte. The only data format every microprocessor today
agrees upon is that memory consists of a linear array
of bytes. This doesn't apply to big mainframes, which 
consider memory to be an array of words, quads, or 
even rare entities like 24-bit or 36-bit chunks, or to some 
minicomputers like the PDP-8, which consider the 
memory to be an array of 12-bit units. 

Within a byte, we should give each bit a number. Here
is where the trouble begins: Should we name D0 the
most significant bit (MSB) or the least significant bit
(LSB)? Both conventions are meaningful. Most
microprocessors today name D0 the LSB, but the TI
9900 and the PDP-8, as well as most big mainframes,
have the reverse convention and name D0 the MSB or
sign bit. The reasons (or rather unreasons) for this
difference are well explained in an article by Cohen. 2 We
will follow Cohen's notation and refer to the first
convention as "little-endian" (LE) and the second
convention as "big-endian" (BE).

We will observe the following definitions: 

• In a little-endian format, the least significant digit 
or bit has the lowest number and is stored at the 
lowest address. 

• In a big-endian format, the most significant digit or 
bit has the lowest number and is stored at the lowest 
address. 
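
The two definitions can be illustrated with a short Python sketch. The helper names store and bit_weight are hypothetical and only serve this example; int.to_bytes is the standard library call that produces the byte sequence in address order.

```python
def store(value: int, size: int, convention: str) -> list:
    """Return the bytes of value in address order (index 0 = lowest address)."""
    order = "little" if convention == "LE" else "big"
    return list(value.to_bytes(size, order))

def bit_weight(number: int, width: int, convention: str) -> int:
    """Numeric weight of the bit that carries this number in a width-bit unit."""
    if convention == "LE":
        return 1 << number            # bit 0 is the LSB
    return 1 << (width - 1 - number)  # bit 0 is the MSB

word = 0x1234  # most significant part 0x12, least significant part 0x34
print([hex(b) for b in store(word, 2, "LE")])
print([hex(b) for b in store(word, 2, "BE")])
print(bit_weight(0, 8, "LE"), bit_weight(0, 8, "BE"))
```

The same numeric value therefore has two different memory images, and bit number 0 names two different physical bits.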

Interestingly enough, the big-endian/little-endian con- 
troversy takes place within the same company (DEC and 
TI have products that employ opposing conventions)
and even within the same processor, as we will see. The 
concept of little-endian or big-endian is thus not a prop- 
erty of a processor. A processor is LE or BE only with 
respect to certain data types. 

Since we cannot remain neutral, we adopt for count- 
ing bits the LE notation, which is the most common for 
microprocessors (Figure 2). In order to have a reference 
point, we shall try to remain little-endian in this article. 

16-bit integer (word). The LSI-11, 3 8086, 4 NS 16000, 5
and iAPX 432 6 store the most significant part of a word
(MSP) with the sign bit at the higher address and are,
according to our definition, little-endians for words. The
MC68000, 7 the Z8000, 8 and the TI 9900 9 store the MSP
at the lowest address and are therefore big-endians for
words. We see already that the MC68000 and Z8000 are
inconsistent, since they are little-endians for counting
bits but big-endians for counting bytes (Figure 3).

32-bit integers. Here again, the data representations
are quite different. The 432, 8086/8087, NS 16000, and
VAX 10 follow the little-endian philosophy, while the
68000 goes the big-endian way, and DEC's LSI-11 takes
a middle way: little-endian for words but big-endian
for quads (Figure 4). 
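
The three quad layouts described above can be written out as a small Python sketch. The function name quad_layout and the style label "mixed" are hypothetical; the "mixed" case follows the description in the text (each word little-endian, but the more significant word at the lower address).

```python
def quad_layout(value: int, style: str) -> bytes:
    """Memory image of a 32-bit integer, lowest address first."""
    if style == "LE":
        return value.to_bytes(4, "little")
    if style == "BE":
        return value.to_bytes(4, "big")
    if style == "mixed":
        # Little-endian inside each word, big-endian between the two words.
        high, low = value >> 16, value & 0xFFFF
        return high.to_bytes(2, "little") + low.to_bytes(2, "little")
    raise ValueError(f"unknown style: {style}")

for style in ("LE", "BE", "mixed"):
    print(style, quad_layout(0x0A0B0C0D, style).hex(" "))
```

A receiver needs to know which of the three images it is looking at before it can recover the value.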

Address. Processors normally consider an address to be 
a 16-bit or 32-bit integer and use the same representation 
for it as for an integer of the same size. Memory addresses 
are public data only within a tightly coupled 
multiprocessor. It makes no sense to transmit a pointer to 
data in a network; however, in a network, a device name 
can be considered an address. The address format must 
therefore also be standardized in a tightly coupled 
multiprocessor. It is relatively easy to enforce a common 
address by mapping, as long as all modules agree that the 
address space is divided into bytes. Here, the little-endian 
notation has the nice effect that it does not require that all 
lines be renamed as address size is increased. 

Floating-point formats and BCD. Although a proposal
for a floating-point format has been submitted to
the IEEE, 11 there is no agreement upon how the 32- or
64-bit string representing the number should be stored,
so everybody does it his own way (Figure 5). The only
consistent way is the Intel way, which stores the
numbers in the little-endian format. DEC 2, 10 uses a mix
between LE and BE.



For BCD, as above, every manufacturer has his own 
format (Figure 6). 

Compound structures. Compound structures are arrays,
records, files, or some combination thereof. There
are only a few hardware-defined compound structures, 
the most common being the array of characters, or 
string. 10 A BCD representation of a number is not an 
array, as long as the structure is treated as a whole by the 
machine. Fortunately, all processors agree that in an ar- 
ray structure, the first element of the array is stored at 
the lowest address. 

Some processors require character strings to have a 
length field, others require them to have a trailing 
delimiter. For these cases, the structure can be viewed as 
a record consisting of two elements, an array of 
characters, and a delimiter or length field. 
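
As an illustration of the two string conventions, the following Python sketch builds both records. The one-byte length field placed before the characters and the zero byte used as delimiter are assumptions made for the example; actual processors differ in field size and position.

```python
def length_prefixed(text: bytes) -> bytes:
    """Record = length field (assumed to be one byte) + array of characters."""
    if len(text) > 255:
        raise ValueError("text too long for a one-byte length field")
    return bytes([len(text)]) + text

def delimited(text: bytes, delimiter: bytes = b"\x00") -> bytes:
    """Record = array of characters + trailing delimiter (assumed to be NUL)."""
    if delimiter in text:
        raise ValueError("delimiter must not occur inside the text")
    return text + delimiter

print(length_prefixed(b"P896"))
print(delimited(b"P896"))
```

In both records the characters themselves are stored first element first, so only the framing differs.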

Urgency of standardization. There is no common data 
representation among processors. As processor com- 
plexity increases, more data types are put in silicon and 
the compatibility problem increases. The 8080 has only 
two hardware-defined data types (bytes and words), 
whereas the VAX has about 10. And the coprocessors 
now being put on the market are further increasing the 
number of hardware-defined data formats, and are do- 
ing so rapidly. Hence, adoption of a standard data 
representation is an urgent matter. 



Data formats on a parallel bus between a 
processor and memory 

The data format in memory does not depend solely on
the processor. Every bus standard today imposes
(unnecessarily, as we will see) a data format for transmission
and storage. The data format on a parallel bus
depends on three factors: data size, memory alignment,
and justification.

Data size. The bus width is the largest chunk of data
that the bus can transmit in parallel in one operation
(e.g., 32 bits), and the bus data unit is the smallest chunk
of data that the bus can transmit at a time as a unit
(usually one 8-bit byte). In a 16-bit-wide bus, for example,
a transfer can consist of a 16-bit word or of an 8-bit
byte. In a 32-bit-wide bus, information can be transferred
by bytes, words, triplets, or quads. In both cases,
the bus data unit is the byte.

The bus width and data unit size dictate the memory 
structure. A memory has exactly one independent bank 
for each bus data unit, e.g., one bank per byte. Any
other arrangement would require a READ/MODIFY/ 
WRITE operation for each WRITE that is smaller than 
the bank size. Just think how a nibble (4 bits) should be 
written into a half-byte without disturbing the other 
half. 
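
The READ/MODIFY/WRITE sequence mentioned above can be sketched in Python as follows. The function write_nibble is a hypothetical helper; the byte-wide bank is modeled as a bytearray.

```python
def write_nibble(bank: bytearray, address: int, upper: bool, nibble: int) -> None:
    """Write 4 bits into a byte-wide bank without disturbing the other half."""
    if not 0 <= nibble <= 0xF:
        raise ValueError("a nibble holds 4 bits")
    old = bank[address]                      # READ the whole byte
    if upper:
        new = (old & 0x0F) | (nibble << 4)   # MODIFY the upper half only
    else:
        new = (old & 0xF0) | nibble          # MODIFY the lower half only
    bank[address] = new                      # WRITE the whole byte back

bank = bytearray([0xAB])
write_nibble(bank, 0, upper=True, nibble=0x5)
print(hex(bank[0]))
```

One bank per bus data unit avoids this extra read, because every unit the bus can carry then has its own write path.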

A memory should always have the same width as the 
bus itself, since the speed gain of a wide bus would 
otherwise be offset by having to store data in con- 
secutive locations in the same chip (which requires two 
memory cycles). 

Alignment. Alignment is an additional restriction im- 
posed by a processor on the data representation in 
memory in order to simplify the interface between the 
processor, the bus, and the memory. 

In a byte-aligned memory, any data item (byte, word, 
or quad) can begin at any byte boundary, i.e., at any ad- 
dress. In a word-aligned memory, a byte can begin at 
any address but a word or a quad is constrained to begin 
at an even address (word address). In a fully-aligned 
memory, a data item can only be stored at a memory ad- 
dress which is an integer multiple of the item's length. 
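
The three alignment rules can be restated as a small predicate. This is a sketch in Python with hypothetical names (is_allowed, and the labels "byte", "word", "full"); sizes and addresses are in bytes.

```python
def is_allowed(address: int, size: int, alignment: str) -> bool:
    """Can a data item of this size start at this address?"""
    if alignment == "byte":
        return True                           # any byte boundary
    if alignment == "word":
        return size == 1 or address % 2 == 0  # words and quads at even addresses
    if alignment == "full":
        return address % size == 0            # multiple of the item's length
    raise ValueError(f"unknown alignment: {alignment}")

for rule in ("byte", "word", "full"):
    print(rule, is_allowed(3, 2, rule), is_allowed(2, 4, rule), is_allowed(4, 4, rule))
```

A quad at address 2, for example, is acceptable in a word-aligned memory but not in a fully aligned one.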

Alignment is related to the width of the bus. If the bus 
is 8 bits wide, there is no reason to align data. All 8-bit 
processors are byte-aligned. Their words can begin at 
any byte address. All 16-bit processors are word-aligned 
on the bus in order to simplify memory addressing 
(Figure 7). 

If a word is aligned on an odd address, the memory 
chip address is not the same in both banks. This com- 
plicates memory control, especially if memory-board 
boundaries are crossed (i.e., if the higher byte is on one
board and the lower on the other). 

All existing 16-bit processors are word-aligned on the 
bus, although the NS 16000 and 8086 claim that any 
data item can be placed at any byte boundary. This is 
only true at the instruction set level, since these processors 
execute in reality two FETCH/STOREs when a 
word begins at an odd address. So instead of more complicated
memory logic, twice as many memory
cycles are used for READs as well as for WRITEs each
time an odd-aligned word is transmitted. The speed
penalty of this operation can be somewhat compensated
for by instruction prefetching. We will refer to this kind
of processor as "pseudo-byte-aligned."

Although all 8-bit microprocessors are by nature byte- 
aligned, they should respect word-alignment when 
writing into memory, or a word-aligned 16-bit processor 
will not be able to read them. The same is true for 16-bit 
processors in a 32-bit world. 

Data in a 32-bit memory can be aligned on byte or 
word boundaries (Figure 8). Thirty-two-bit buses are 
always fully aligned, since any other arrangement would 
require a complicated byte routing system such as a 4
x 4 crossbar matrix or a 16-byte switch. Some 32-bit
processors like the VAX are byte-aligned, and so they re- 
quire two bus cycles for every data item that crosses a 
quad boundary. Interestingly, this occurs three-fourths 
of the time if the quads are stored randomly. 
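
The three-fourths figure can be checked by enumeration. The Python sketch below assumes a 4-byte-wide bus and a quad whose starting offset within a bus-wide unit is equally likely to be 0, 1, 2, or 3; the function name bus_cycles is hypothetical.

```python
def bus_cycles(address: int, size: int, bus_width: int = 4) -> int:
    """Number of bus-wide memory units touched by an item of this size."""
    first = address // bus_width
    last = (address + size - 1) // bus_width
    return last - first + 1

# Offsets of a quad within a 32-bit unit, and those that cross a boundary.
offsets = range(4)
crossing = [a for a in offsets if bus_cycles(a, 4) > 1]
fraction = len(crossing) / len(offsets)
print(crossing, fraction)
```

Only offset 0 fits in a single cycle, so three of the four possible placements need two.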

Full alignment complicates assemblers and compilers 
and results in some loss of storage, but reduces logic com- 
plexity in both the processor and the memory and speeds 
up execution. Furthermore, the programmer need not be 
aware of memory-board boundary crossings. 

Straight (nonjustified) bus. In a straight, or nonjustified,
bus, the bus lanes are direct extensions of the
memory banks. In a 16-bit bus, the lanes are termed
"odd" and "even." In a 32-bit bus they are named 0, 1,
2, and 3, with bank 0 being accessed when the two lower
bits of the address are 0. The data path that comes out
of all processor chips today is straight. Straight buses
include the LSI-11 bus, which has a little-endian data
format, and the 16-bit Versabus, which is big-endian. The
P896 bus is also straight and it has an uncommitted
little-endian format.

Some processors such as the MC68000 make the lane
alignment explicit by issuing one control signal per byte
lane, e.g., upper data strobe, lower data strobe. Other
processors code this information using the lowest address
bit(s) and size (byte/word) information, e.g., A0
and byte/word (LSI-11, 8086, and NS 16000), which
select the correct lane. (The A0/size solution saves one
line if address and data are multiplexed.) Figure 9 shows
how a big-endian and a little-endian processor are connected
to a straight bus.

Figure 8. Quad and word storage in a 32-bit memory, with effects of
alignment shown. Bytes can be stored anywhere.

Since straight bus lanes are just extensions of memory
banks, a processor has full control over storage in
memory. Hence, the bus should not prejudice the data
format used. But, in reality, the bus of Figure 9 is a hidden
little-endian. For bytes to be stored at the correct
address, the MSB line of the big-endian processor must
be connected to the B7 bus line, and the LSB line to the
B8 line. This could puzzle many a designer, since we
have named the bus lines of Figure 9 using a little-endian
notation. If the bits of a word were counted the
big-endian way, as in the TI 9900, D0 would be the MSB and
it would be connected to B15, and D1 would be connected
to B14, and so on. What adds to the confusion is
that most big-endian processors such as the Z8000 use
the little-endian convention when counting the bits of a
word or an address. In this case, D15 of the processor
must be connected to B7, and D0 to B8.

So, just by naming the bus lines, we have favored a
particular data format in memory. For instance, DEC's
LSI-11 bus is straight, but just by specifying that

BDAL0 (address bit zero)
= 1 selects the high byte, BDAL<15:08>,

it declares itself a little-endian. The bus is overspecified
and it needlessly rules out processors of the opposite
type. Fortunately, this is only a problem of naming. The
designer can avoid the pitfall if he makes sure that any
general-purpose nonjustified bus is not named from D0
to D15, but has an independent notation for each byte
lane. 



Bus control signals

The table below shows which signals the different processor types use to control the bus. These signals
select the high or low byte of the bus. There exist two variants, the A0/byte signal, which is used by almost
every processor, and the UDS/LDS (upper data strobe, lower data strobe) signal, which is used by the MC68000.

PROCESSORS        BYTE SELECT      ALIGNMENT                      WORD FORMAT
MC68000           UDS, LDS         WORD-ALIGNED                   BE
Z8000             A0 AND B/W       WORD-ALIGNED                   BE
TI 9900           NONE             WORD-ALIGNED                   BE
8086              A0 AND BHE       PSEUDO-BYTE-ALIGNED            LE
NS 16032          A0 AND HBE       PSEUDO-BYTE-ALIGNED            LE
LSI-11            A0 AND WTBT      WORD-ALIGNED                   LE
VAX               MASK <3:0>       PSEUDO-BYTE-ALIGNED            LE

BUSES             BYTE SELECT      JUSTIFICATION                  WORD FORMAT
S-100             A0 AND sXTRQ     BYTE-JUSTIFIED                 BE
MULTIBUS          ADR0 AND BHEN    BYTE-JUSTIFIED                 LE
Q-BUS             AD0 AND WTBT     NOT JUSTIFIED                  LE
SBI               MASK <3:0>       NOT JUSTIFIED                  LE
VERSABUS          A01 AND LWORD    WORD-JUSTIFIED                 BE
VME BUS           DS1 AND DS0      WORD-JUSTIFIED                 BE
P896, DRAFT 5.2   COMMANDS <3:0>   SUBSET: WORD-JUSTIFIED;        LE
                                   FULL WIDTH: NOT JUSTIFIED

KEY:
LE = LITTLE-ENDIAN          BHE = BYTE HIGH ENABLE
BE = BIG-ENDIAN             WTBT = WRITE BYTE
UDS = UPPER DATA STROBE     HBE = HIGH BYTE ENABLE
LDS = LOWER DATA STROBE     A0 = ADDRESS BIT ZERO

What happens if one tries to store the words correctly
by inverting the byte lane connections of one of the
processors (Figure 10)? (Forget about the byte switch BS for
the moment.) Alas, this method only works with fully
aligned words; an odd-aligned word will be stored
differently by the two processors: in the correct lane, but at
the wrong address. At best, this method can be used to
integrate aligned processors such as the MC68000 into a
little-endian system. If the processor is fully aligned,
however, one can even store bytes at the correct address.
This can be done by routing a single byte to the correct
lane with the help of the byte switch shown in Figure 10.
To do so, one must rely on the byte/word signal that
most processors give to know when to swap the byte.

Unfortunately, this can lead to trouble. First of all,
not all processors issue byte/word information. The TI
9900, for instance, has no such line; it systematically
accesses its memory as an array of words. To write a
byte, it always does a READ/MODIFY/WRITE. The
TI 9900 will store its words the big-endian way no matter
what the bus format. Other processors like the LSI-11/2
do have a byte/word line, but the information on it is
simply ignored. Like the TI 9900, this processor always
accesses the memory by words and does a
READ/MODIFY/WRITE operation on the interesting
half. Even worse, some processors like the LSI-11/23
issue the byte/word indication, but only for WRITEs.
They always read a word and internally select the
accessed byte.

Pseudo-byte-aligned processors like the 8086 issue a
byte/word indication, but it cannot be relied on.
"Byte" does not mean that a single byte has been
transmitted: the interface cannot distinguish the
writing of two halves of a word from the writing of two
separate bytes. These processors will still store the
odd-aligned words in a little-endian format and will do so
independently of the bus format.

So the circuit of Figure 10 can only be used in some 
restricted cases; it is not recommended for general use. 
The designer should stick to the rule that the bytes 
should be at the correct place, and he should take into 
account that the words can be inconsistent. 
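
The limits of lane inversion can be explored with a deliberately simplified Python model. It is an assumption of this sketch, not a description of Figure 10, that inverting the two byte lanes of a 16-bit bus amounts to toggling address bit 0 of every byte that crosses the interface; the names write_word and image are hypothetical.

```python
def write_word(memory: dict, address: int, value: int, fmt: str, swap_lanes: bool) -> None:
    """Store a 16-bit word byte by byte.

    fmt is the processor's own word format ("LE" or "BE").
    swap_lanes models inverted lane wiring as toggling address bit 0
    of each byte (a simplifying assumption).
    """
    data = value.to_bytes(2, "little" if fmt == "LE" else "big")
    for offset, byte in enumerate(data):
        target = address + offset
        if swap_lanes:
            target ^= 1
        memory[target] = byte

def image(address: int, fmt: str, swap_lanes: bool) -> dict:
    """Memory contents after writing 0x1234 at the given address."""
    memory = {}
    write_word(memory, address, 0x1234, fmt, swap_lanes)
    return memory

# Aligned word: the swapped big-endian write matches the little-endian one.
print(image(4, "LE", False) == image(4, "BE", True))
# Odd-aligned word: the two images no longer agree.
print(image(5, "LE", False) == image(5, "BE", True))
```

In this model the aligned case works because both bytes stay inside one memory word, while an odd-aligned word straddles two memory words and each half is displaced within its own word.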

Justified bus. Justification, as in typography, means
that data which are not as wide as the bus are bound to
the left or the right of the path. A bus is byte-justified if
a single byte always travels on the same B7-B0 lane, a
single word on B15-B0, and a quad on B31-B0. In a
straight bus, however, a single byte can travel on either
the B7-B0 lane or the B15-B8 lane, depending on
whether it must be placed at an odd or at an even
address. A bus is word-justified when only words are
justified, but a byte within a word is not justified. 
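
The difference in lane usage for a single byte on a 16-bit bus can be sketched as below. The sketch assumes the little-endian lane naming adopted in this article (odd addresses on B15-B8); the function names are hypothetical.

```python
def byte_lane(address: int, justified: bool) -> str:
    """Bus lane on which a single byte travels."""
    if justified:
        return "B7-B0"                            # always the same lane
    return "B15-B8" if address % 2 else "B7-B0"   # lane follows the address

def needs_byte_switch(address: int, justified: bool) -> bool:
    """True when the byte must be rerouted between bus lane and memory bank."""
    bank_lane = byte_lane(address, justified=False)  # banks sit on straight lanes
    return byte_lane(address, justified) != bank_lane

for address in (6, 7):
    print(address, byte_lane(address, False), byte_lane(address, True),
          needs_byte_switch(address, True))
```

On the justified bus a byte destined for an odd address arrives on the lower lane and has to be moved to the upper bank, which is the job of the byte switch.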

Justified buses include the Multibus, 12 which is
byte-justified and little-endian; the S-100 bus, 13 which is
byte-justified and big-endian; and the 32-bit
Versabus/VME bus, 14,15 which is word-justified and
big-endian. Note that all these standards specify a storage
format in memory. Justification does not imply it: a
justified bus is in principle no different from a straight
bus with a multiplexed data path. The same observations
as for the straight bus hold.
Justification requires that a single byte be recognized
as such and be routed to the correct lane by a byte
switch. This requirement is recognized in the Multibus,
S-100 bus, and Versabus (Figure 11). Justification
assumes that the processor indicates the width of its
data; as we have seen above, this is not always the case.
Processors such as the TI 9900 and the LSI-11 cannot be
integrated into a justified bus.

The control lines of a justified bus indicate two 
things: the start address and the size of the data. The 
availability of such information looks appealing in terms
of ensuring data format compatibility. One gives the
start address of a data item and its size, and the bus in- 
terface is responsible for accessing the memory in the 
correct way. However, the data format will be defined 
only if one also indicates which half of the word is the 
MSP and which is the LSP. As shown in Figure 11, the 
justified bus respects the storage of bytes in memory, 
but handles words inconsistently. 

The above-mentioned buses impose a data format in 
memory by coupling the odd/even with the high/low- 
byte indications. Strictly following these standards, one 
should be able to achieve a common data format by con- 
necting the processors as shown in Figure 12. Alas, as 
was the case with straight buses, this method does not 
allow consistent storage of odd-aligned words. And here 
again one cannot rely on the byte/word indication of the
processor. For these reasons, great care must be taken 
when interfacing a processor with a justified bus of the 
opposite data format. 

There is potential trouble, for example, in connecting 
an 8086 or an NS 16000 (little-endians, pseudo-byte- 
aligned) to an S-100 bus (big-endian, justified), as 
shown in Figure 12. While aligned words will be stored 
in the correct big-endian format, words which begin at 
an odd address (and which are transmitted in two bytes) 
will be stored the little-endian way, because the interface 
is unable to distinguish the transfer of two halves of a 
word from the transfer of two single bytes. This should
not greatly affect a big-endian processor like the
MC68000 or the Z8000, since these processors cannot
read odd-aligned words anyway. But the programmer 
should not access a word byte-wise, since the program 
will work differently when accessing local memory or 
global memory or when accessing an odd- or an even- 
aligned word. 
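
The inconsistency described above can be reproduced with a short Python model. It assumes, as the text states, that the pseudo-byte-aligned processor sends an odd-aligned word as two byte transfers and that the interface to the big-endian bus can only tell "word" from "byte"; the function names are hypothetical.

```python
def transfers_le_cpu(address: int, value: int) -> list:
    """Bus transfers issued by a pseudo-byte-aligned little-endian CPU."""
    low, high = value & 0xFF, value >> 8
    if address % 2 == 0:
        return [("word", address, value)]
    # Odd-aligned word: two byte transfers, low byte at the lower address.
    return [("byte", address, low), ("byte", address + 1, high)]

def store_big_endian(memory: dict, transfers: list) -> None:
    """Interface to a big-endian bus: words are stored MSP first."""
    for kind, address, data in transfers:
        if kind == "word":
            memory[address] = data >> 8        # MSP at the lower address
            memory[address + 1] = data & 0xFF
        else:
            memory[address] = data             # a byte is stored where it is sent

def stored(address: int) -> dict:
    """Memory contents after the CPU writes 0x1234 at the given address."""
    memory = {}
    store_big_endian(memory, transfers_le_cpu(address, 0x1234))
    return memory

print(stored(4))  # aligned word
print(stored(5))  # odd-aligned word
```

The aligned word ends up with its MSP at the lower address, while the odd-aligned word keeps the little-endian order, so byte-wise access to a word gives address-dependent results.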

Why justify a bus?

Justification does not guarantee data format consistency;
on the contrary, some processor types are
ruled out when a bus is justified. So why are buses
justified?

Justification allows communication between buses
of different physical width. For instance, the Multibus
is justified to let 16-bit processors communicate with
8-bit peripherals. Justification requires a certain
overhead in order to route data to the correct place,
since the processor bus is always straight. This
overhead consists of a single byte switch in a 16-bit
system, but four are required in a 32-bit system. Figure
1 shows the logic required to route data for processors
of different data width in a little-endian system. Let us
compare this configuration with the same configuration
on a straight bus (Figure 2). Here, each module
must have access to all byte lanes. This puts the
overhead of interfacing on the smaller systems. In a
justified bus, transfers are always optimized for the
participant with the smallest data path, e.g., for 8-bit
devices in a byte-justified bus. The burden of compatibility
is put on modules with wider data paths, e.g.,
on 16-bit modules in the Multibus.

In a mixed system comprising modules of different
data widths, there are two reasons why one should
introduce justification:

• To minimize cost. Since the majority of modules
are of small width, overall cost can be reduced by
burdening compatibility on the widest modules.
This was the case at the time the Multibus was
introduced.

• To accommodate smaller modules that do not
have access to the whole width of the data path.
The Versabus/VME bus is word-justified, since
16-bit systems do not have access to the full width
of the 32-bit data path. (The higher data lines
D16-D31 are on the second, facultative connector.)

Justification introduces some additional constraints.
For instance, it obliges large-width modules
to always speak in the language of small-width
modules when dealing with them. A 16-bit module in a
Multibus system must always access an 8-bit module
by bytes. A smarter interface could, of course,
automatically split a 16-bit word into two bytes when
accessing an 8-bit device, but to do so it would have to
receive a reply status from the accessed module telling
it whether it was talking with an 8-bit or a 16-bit
device.

Another constraint is that in a justified bus all
transfers must be fully aligned. Doing otherwise would
require that the interface be capable of swapping
data. For instance, both halves of a word would have
to be swapped if the word were accessed at an odd
location (see Figure 7 in the main text of this article).
This would cost four byte switches in a 16-bit system,
and 16 byte switches in a 32-bit system. Fortunately,
among processors today none send odd-aligned
words; they use pseudo-byte alignment instead.



Figure 11. Interfacing a little-endian and a big-endian to a justified bus. 
On the other hand, it is easier to put an MC68000
or a Z8000 (big-endian) on a Multibus (justified
little-endian), since these processors are fully aligned. The
byte/word indication can then be used to indicate a
single byte or word transfer, and the byte switch of
Figure 12 will work. The Multibus can be made
transparent to the processor, but the programmer must
always access a word word-wise, not byte-wise.

The justified standard buses mentioned above are 
overspecified. To interface a processor of one type to a 
bus of the opposite type, the designer must violate the 
standard. In general, he should always try to have the 
bytes at the correct place (Figure 11), and he can only 
achieve consistent storage for words in a few rare cases 
(Figure 12). 



We have seen that there is no common data format 
shared by the processors we have studied, except for the 
byte. Data format translation can be done by software. 
However, this translation can be quite time-consuming. 
The ideal alternative would be to let the hardware of the 
processor/bus interface do not only the physical adapta- 
tion, but also the data format translation. This is what 
has been attempted in the configurations shown in 
Figures 9 through 12. 

To do data format translation, the interface must 
know which type of data is being transmitted. Unfor- 
tunately, no current processor indicates this. A logic 
analyzer connected to the processor bus is unable to 



Should a bus be optimized for 16-bit or
32-bit transfers?

There is still no integrated 32-bit processor with a 
32-bit-wide data path. The data path of the processors 
that claim to be 32-bit machines is still 16 bits wide, 
and such processors can just as well be called 64-bit 
machines, since their arithmetic unit can manage 
floating-point numbers of that size. So we will classify 
processors according to the width of their data path, 
and not according to the width of their internal 
registers. In our terminology the iAPX 432 is a 16-bit 
device. 

But processors with 32-bit-wide data paths are
bound to come. The interesting question is how their
data paths will be organized and how future 32-bit
system buses will look.

How advantageous is a 32-bit processor? It would
seem that the throughput of a 32-bit processor would
be twice that of a 16-bit processor, since it executes
transfers on data twice the size of 16-bit data. In
reality, this is only true if all transfers can be made 32 bits
wide. This is not always possible. In a 32-bit system
like the VAX, a small percentage of transfers is done
on bytes (characters) and not on 32-bit entities. This
percentage is application-dependent.

It is interesting to note that this problem most
affects one application for which 32-bit systems are
currently being developed: graphics processors. The
screen is normally accessed as a RAM which is 16 bits
wide (for color and symbols) and not 32 bits wide. The
only operation in which 32-bit operations are
interesting is copying one portion of a screen to another.
Most other operations involve only one pixel or
character at a time. This bus underutilization is,
however, secondary. What most affects throughput is
whether the 32-bit processor is byte-aligned or not. If
the 32-bit processor is quad-aligned, i.e., if each quad
is bound to a byte address divisible by 4, then the
throughput will be effectively doubled. But
communication is difficult in 16-bit systems, which are not
constrained to operations on quad boundaries. It is
especially difficult to make programs written for 16-bit
processors compatible with a 32-bit machine.

On the other hand, if the 32-bit processor is
byte-aligned, i.e., if any quad can begin on any byte
address, then only one-fourth of all transfers will use the
full bandwidth of the bus. Quads which are not
quad-aligned will be fetched or written in two successive
bus cycles, and they will account for three-fourths of
all quad transfers (one-half of all word transfers, also).
This is the case with the VAX, for example. The
situation can be improved by processors that use a
prefetch for instructions and (sometimes) variables.
With such processors, only the variables, which
account for about half of all transfers, will be affected
by alignment.
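
The one-fourth and three-fourths figures for quads can be checked with a short count. The sketch below is illustrative only: the function name bus_cycles is hypothetical, and it assumes that quads start at each of the four byte offsets equally often and that the bus can only move one aligned 32-bit group per cycle.

```python
def bus_cycles(address: int, size: int, bus_bytes: int = 4) -> int:
    """Count the bus cycles needed to move `size` bytes starting at `address`.

    Assumption: each cycle moves one aligned group of `bus_bytes` bytes,
    so a datum that straddles a group boundary needs an extra cycle.
    """
    first_group = address // bus_bytes
    last_group = (address + size - 1) // bus_bytes
    return last_group - first_group + 1


# A quad (4 bytes) may begin at any of the four byte offsets of a group.
quad_cycles = [bus_cycles(offset, 4) for offset in range(4)]

# Fraction of quads that use the full bus width in a single cycle,
# and fraction that must be split into two successive cycles.
single_cycle_fraction = quad_cycles.count(1) / len(quad_cycles)
two_cycle_fraction = quad_cycles.count(2) / len(quad_cycles)

print(quad_cycles, single_cycle_fraction, two_cycle_fraction)
```

Under these assumptions only the quad at offset 0 completes in one cycle; the other three starting offsets each cost two.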

Because of these considerations, we expect a 32-bit
system to be only about 50 percent, not 100 percent,
faster than a 16-bit system.
How should a mixed 16/32-bit system be optimized?
There are two basic options:

• optimize buses for 16-bit-wide transfers, and

• optimize buses for 32-bit-wide transfers.

Both options are found in system buses. The
Versabus, for instance, has 16-bit optimization, while the
Fastbus 1 has 32-bit optimization.

16-bit optimization. The first option means that
buses are optimized for 16-bit processors. A processor
with a 16-bit-wide data path interfaces only to a 16-bit
bus (Figure 1). The 32-bit bus is then word-justified,
which has the nice side effect that a bus subset with
fewer pins can be made for 16-bit processors.

The burden of interfacing is put on the 32-bit
modules. Every 32-bit module must have a word switch
in the form of two additional 8-bit buffers. These
buffers can be efficiently implemented within the
processor itself with no cost or delay penalty.
Thirty-two-bit-wide memories must also have a word switch,
although integrating that switch will be difficult. The
total delay introduced by these additional buffers is
normally negligible (about 30 nanoseconds).

An obvious disadvantage of this scheme is that a
32-bit processor can only communicate with a 16-bit
memory by accessing it word-wise; i.e., it must know in
advance whether it communicates with a 16- or a 32-bit
memory. The communication protocol must ensure
that the two participants in a data transfer always
communicate on the level of the smallest data width.

32-bit optimization. In a 32-bit optimized system, the
data path on the system bus is always 32 bits wide.
Sixteen-bit systems that interface to this bus must
have access to all 32 lines (Figure 2). Sixteen-bit
devices are penalized by 16 additional bus drivers.

All memories ought to be 32 bits wide in a 32-bit
optimized system. A 16-bit memory makes little sense,
but it may be required for operations such as
accessing internal registers. However, a processor needs no
knowledge of the memory's width, since a 16-bit
memory will appear to the bus to be a (slow) 32-bit
memory.

Figure 1. 32-bit bus optimized for 16-bit processors, with 16- and 32-bit devices attached.

There is no time delay involved in a 32-bit optimized
system. This solution requires additional power
drivers, however. The 32-bit optimization also prevents
the building of a 16-bit bus subset, which may be
either an annoyance or a blessing, according to one's
system philosophy.

Comparison. The 16- and 32-bit optimizations have
about the same number of advantages and
disadvantages (Table 1). We assume that future 32-bit
processors will be tailored for high throughput but will
nevertheless retain 16-bit compatibility to ease their
introduction and lower system costs. For high
throughput, a processor should be able to
communicate without overhead with a 32-bit memory. To
do this, a 32-bit processor must be able to
independently steer each byte lane, either with a strobe
(or a mask bit) per lane, or with a data length indication
(byte/word/triplet/quad) plus the address at which the
data should begin. (Each representation can be
partially mapped onto the other, the second being
somewhat more elegant.)
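
The partial mapping between the two representations can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, byte lane i is taken to carry the byte at offset i within the quad (a little-endian lane assignment), and a transfer is assumed to fit within one 32-bit bus cycle.

```python
def lanes_from_length(address: int, length: int) -> int:
    """Map (start address, data length) to a 4-bit lane mask.

    Bit i of the result is set when byte lane i carries data.
    Length 1/2/3/4 stands for byte/word/triplet/quad.
    """
    offset = address % 4
    if not 1 <= length <= 4 or offset + length > 4:
        raise ValueError("transfer does not fit in one 32-bit bus cycle")
    return ((1 << length) - 1) << offset


def length_from_lanes(mask: int):
    """Map a lane mask back to (offset, length), or None if impossible.

    Only contiguous masks have a length/address equivalent; a mask
    with a gap (e.g. lanes 0 and 2) can be expressed with strobes only.
    """
    if not 0 < mask <= 0b1111:
        return None
    offset = (mask & -mask).bit_length() - 1   # position of lowest set bit
    run = mask >> offset
    if run & (run + 1):                        # run is not of the form 2**k - 1
        return None
    return offset, run.bit_length()


print(bin(lanes_from_length(1, 2)), length_from_lanes(0b0110), length_from_lanes(0b0101))
```

Every length/address pair yields a strobe pattern, but not every strobe pattern has a length/address equivalent, which is one way to read "partially mapped."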



Table 1.
16-bit vs. 32-bit optimization for a 32-bit bus.

                    16-BIT OPTIMIZED             32-BIT OPTIMIZED

TIME PENALTY WHEN
TRANSMITTING

  A WORD            NONE                         NONE

  A QUAD TO A       TWICE THE TRANSFER TIME      TWICE THE ACCESS TIME FOR
  16-BIT MODULE     BECAUSE A QUAD MUST BE       16-BIT MEMORIES BECAUSE OF
                    SPLIT INTO TWO WORDS*        STORAGE IN CONTIGUOUS
                                                 ADDRESSES ON THE SAME CHIP

  A QUAD TO A       SAME AS ABOVE FOR A          NONE
  32-BIT MODULE     16-BIT MODULE; NONE
                    FOR A 32-BIT MODULE

LOGIC AMOUNT        TWO LOW-POWER 8-BIT          TWO 8-BIT BUS DRIVERS
                    BUFFERS FOR EACH             FOR EACH 16-BIT PROCESSOR
                    32-BIT MODULE                (ONE IF ADDRESS/DATA
                                                 MULTIPLEXED)

                    ADDITIONAL LOGIC MAY         ADDITIONAL LOGIC
                    BE REQUIRED FOR              REQUIRED FOR 16-BIT
                    AUTOMATIC SPLITTING          MODULES TO ASSEMBLE
                    IN 32-BIT DEVICES            32-BIT DATA

BUS SUBSET          16-BIT SUBSET POSSIBLE       NO 16-BIT SUBSET
                                                 POSSIBLE

IMPACT ON           NONE                         NONE
DATA FORMAT

*Splitting is required anyway in 75 percent of the cases if the quads are not fully aligned.



In the interest of 16-bit compatibility, a 32-bit
processor should probably have a 16-bit mode which uses
word justification. Its internal shifter should allow the
processor to perform justification at no cost. Although
high throughput requires full alignment, 16-bit
compatibility asks for byte alignment. Again, byte
alignment is cheap to achieve internally, but programmers
should be encouraged to use full alignment whenever
possible.

To be integrated into a justified bus system, a 32-bit
processor must issue data size information, even for
READs. In the interest of compatibility and ease of
program debugging, the processor should indicate on
which type of data it is operating. This requirement
goes further than just bus format compatibility and
paves the way for a consistent system-wide object
architecture.

Rationale for a 32-bit bus data format. There were
several reasons why the EDISG chose a little-endian
representation for its proposal 2 for the IEEE P896
backplane bus. One was that all 16-bit BE
microprocessors (the MC68000, Z8000, and TI 9900) are
word-aligned. Except for the TI 9900, which has no
byte/word indication, these processors can be easily
integrated into a little-endian world with a byte switch
like that shown in Figures 10 and 12 in the main text (if
some precautions in programming are taken), while LE
processors are mostly pseudo-byte-aligned and
cannot be easily fitted into a BE system. So in order to
maximize manufacturer independence, a little-endian
representation was recommended.

Another motive for the choice of a little-endian
representation was that the Versabus and the VME
bus are big-endian and therefore do not conform to the
representation used in little-endian processors like the
Intel 8086 or the NS 16000. Furthermore, the EDISG
thought that 16-bit processors will dominate system
design for some years to come, and therefore it
recommended that the bus be 16-bit optimized.
distinguish among the transfer of a 16-bit integer, a
16-bit address, four BCD digits, or the higher 16 bits of
the mantissa of a floating-point number. The logic
analyzer cannot even distinguish an instruction fetch
from a data read in some processors like the LSI-11,
unless it uses some tricky "manufacturer-reserved"
lines.

The one indication a processor gives about the data it
processes, the byte/word signal, can be used to convert
the data format of words and bytes only when

• the processor issues and itself respects the
byte/word indication (this is not the case for the TI
9900 and the LSI-11);

• the byte/word information indicates that the data
transmitted are a single byte, and not the high or
low part of a word (this rules out all
pseudo-byte-aligned processors); and

• data are always read in the same format as they are
written (this is left to the programmer's care).

Of all the processors we have discussed here, only the
MC68000 and the Z8000 are suited for automatic format
adaptation of bytes and words at the interface. With a
little care in programming, the designer can quite easily
integrate these processors into a little-endian world, and
the processors can consistently store at least bytes and
words, which are the most common data types.
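
The effect of such an interface, and the programming precaution it demands, can be illustrated with a toy model. This is a sketch, not a description of any actual hardware: the class ByteSwitchInterface and its methods are hypothetical names, and the model simply assumes that byte transfers are routed to the lane of their own address while word transfers pass through with their numeric value intact.

```python
class ByteSwitchInterface:
    """Toy model of a fully aligned big-endian processor on a
    little-endian memory, seen through a byte/word-controlled switch."""

    def __init__(self, size: int = 16) -> None:
        self.memory = bytearray(size)  # contents as the little-endian bus sees them

    def write_byte(self, address: int, value: int) -> None:
        # Byte transfer: the switch puts the byte at its own address.
        self.memory[address] = value & 0xFF

    def read_byte(self, address: int) -> int:
        return self.memory[address]

    def write_word(self, address: int, value: int) -> None:
        if address % 2:
            raise ValueError("words must be fully aligned (even address)")
        # Word transfer: the value is preserved, so it lands in
        # little-endian order (low byte at the lower address).
        self.memory[address:address + 2] = (value & 0xFFFF).to_bytes(2, "little")

    def read_word(self, address: int) -> int:
        if address % 2:
            raise ValueError("words must be fully aligned (even address)")
        return int.from_bytes(self.memory[address:address + 2], "little")


iface = ByteSwitchInterface()
iface.write_word(0, 0x1234)
iface.write_byte(4, 0x41)

# A little-endian processor sharing the memory sees the same word value.
word_seen_by_le = int.from_bytes(iface.memory[0:2], "little")

# Reading the word byte-wise breaks the illusion: a big-endian program
# would expect its high byte (0x12) at the word's own address.
first_byte_of_word = iface.read_byte(0)
print(hex(word_seen_by_le), hex(first_byte_of_word))
```

In this model bytes and words are stored consistently as long as each item is read back in the format in which it was written, which is the third condition listed above.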



All existing bus standards indirectly impose a data 
format for memory because they are overspecified. 
Hence, following the data type convention of one
standard rules out processors of the opposite type. To
overcome this limitation and standardize all data types, the
interface would need to track the instruction flow of the 
processor and decide which type of data it is transmit- 
ting. The complexity of such an interface would ap- 
proach that of the processor. Furthermore, such an in- 
terface would suppose that the processor itself has some 
knowledge of the data type. However, the processor has 
such knowledge only for hardware-defined types. 

So until processors with standardized data types are 
available, the best thing is to leave the bus uncommitted 
and stick to the rule that bytes must be stored in the cor- 
rect place. This can always be done, unless one uses a 
justified bus. 

Automatic type conversion between processors
having different data representations is restricted to very
simple cases. It is possible to achieve partial
compatibility of data items that are not wider than the bus width if
one relies on the byte/word indication of some
processors, as long as the data items are retrieved in the
same format as they have been stored. This method is
therefore not applicable to pseudo-byte-aligned
processors and is currently restricted to the MC68000 and
the Z8000. The interface does not have sufficient
knowledge to perform any other conversion, since the
processor issues no information about the type of data it
is manipulating and often has no such information.

Since an automatic format conversion at the interface 
between processor and bus is in most cases impractical, 
standard buses like the Multibus and S-100 bus should 
not impose a storage format for memory (e.g., by cou- 
pling the "low" byte with an "even" address). The only 
practical effect of imposing a data format is to favor the 
processor for which the bus was originally designed. 

Although it is easier to only use homogeneous
multiprocessors, the simple fact that data interchange media
are standardized encourages people to build
heterogeneous ones. Before it is too late, a common
data interchange format should be standardized. I
recommend that this standard be the little-endian
format, since it is the natural way binary numbers are
represented and since most existing processors can
conform to it with little expense.

It is highly desirable that a processor in a multi- 
processor system respect full alignment; i.e., an item of 
data must always be placed at an address that is a multi- 
ple of its size. Besides ensuring compatibility, full align- 
ment speeds up execution on pseudo-byte-aligned pro- 
cessors. This is mostly a problem for compiler writers. 
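
For a compiler writer, the full-alignment rule amounts to rounding each item's address up to a multiple of its size. The following is a minimal sketch under that reading; the function names and the example record are hypothetical.

```python
def is_fully_aligned(address: int, size: int) -> bool:
    """True when the address is a multiple of the item's size."""
    return address % size == 0


def layout(field_sizes):
    """Assign each field the next offset that is a multiple of its size.

    The padding this introduces is the price paid for full alignment.
    """
    offsets = []
    offset = 0
    for size in field_sizes:
        offset = (offset + size - 1) // size * size  # round up to a multiple of size
        offsets.append(offset)
        offset += size
    return offsets


# A record holding a byte, a quad, a word, and a 64-bit item, in that order.
record_offsets = layout([1, 4, 2, 8])
print(record_offsets)
```

With this rule the quad skips three bytes of padding after the leading byte, and the 64-bit item is pushed to the next multiple of eight.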

It is also desirable that data be read in the same for- 
mat as it has been written. Buses should be justified only 
to connect buses of different physical widths. Justifica- 
tion does not guarantee any kind of data format com- 



A proposal for a common data representation 

This proposal for a common data format for
multiprocessors uses a little-endian rather than a
big-endian representation. There is no compelling reason
to choose one over the other, as long as whatever
representation is chosen is consistent within itself (i.e.,
a processor should be LE or BE for all data and not
LE for some and BE for others). One reason for
choosing a little-endian format is its natural way of
representing numbers. D7 is more significant than D0. Although
we speak our numbers as BE, we add them in the LE
way, beginning with the rightmost digit.

Communication chips are all little-endians.
(However, disk controllers are often big-endians, and
cyclic redundancy codes, as used in protocols like
HDLC, are also big-endian.) An additional argument in
favor of little-endian representation is that it is not
difficult to integrate most existing BE processors into an
LE system (at least for bytes and words, since no BE
processors are pseudo-byte-aligned).

This proposal matches, to a certain extent, the
representation used in Intel's 8087 and National's NS
16000. Except for the floating-point format, it is also
compatible with DEC's LSI-11. The least significant
bit of a datum has the lowest numbering, 0. Bits within
a datum are numbered in decimal. The higher
significant bit appears to the left, the lower significant bit
to the right (e.g., AD<31...0> and not
AD<0...31>). The following types of data are
defined:

• INTEGER_8: An 8-bit integer, stored as a byte at
any byte address. The least significant bit is
number 0 and the most significant bit is number 7.

• CHAR_8: A character, stored in an 8-bit byte.
ASCII format with even or no parity is
recommended.

• INTEGER_16: A 16-bit integer, stored in two
consecutive byte locations. The MSB with the sign is
at the higher address, and the LSB in bit D0 is at
the lowest address. This integer can represent an
address <unsigned_integer> or a 2's-complement
number <signed_integer>.

• INTEGER_32: A 32-bit integer, stored in four
consecutive byte locations. The MSB with the sign is
at the higher address, and the LSB in bit D0 is at
the lowest address. This integer can represent an
address <unsigned_integer> or a 2's-complement
number <signed_integer>.

• INTEGER_64: A 64-bit integer, stored in eight
consecutive byte locations. The MSB with the sign is
at the higher address, and the LSB in bit D0 is at
the lowest address.

• FLOAT_32: A 32-bit floating-point number
according to the proposed IEEE floating-point standard.
The sign and most significant part of the exponent
are at the higher address byte, and the LSB in bit
D0 is at the lower address.

• FLOAT_64: A 64-bit floating-point number
according to the proposed IEEE floating-point standard.
The sign and most significant part of the exponent
are at the higher address byte, and the LSB in bit
D0 is at the lowest address.

• BCD_X: A string of x BCD numbers, each filling
a nibble (four bits). The least significant nibble is
at the lower address, and the least significant
nibble within a byte is at D<3:0>.

• SET_X: A string of x Boolean bits. The first
element of the set is in bit 0, which is at the lowest
byte address.
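
The storage rules for several of these types can be tried out with Python's standard struct module, whose "<" prefix selects little-endian byte order. This is a sketch under assumptions: the encode_* helper names are hypothetical, and the floating-point layout is taken to be that of the IEEE 754 single-precision format, which is what struct's "f" code produces.

```python
import struct


def encode_integer_16(value: int) -> bytes:
    """INTEGER_16 as a signed 2's-complement number, LSB at the lowest address."""
    return struct.pack("<h", value)


def encode_integer_32(value: int) -> bytes:
    """INTEGER_32 as a signed 2's-complement number, LSB at the lowest address."""
    return struct.pack("<i", value)


def encode_float_32(value: float) -> bytes:
    """FLOAT_32: sign and top exponent bits end up in the highest-address byte."""
    return struct.pack("<f", value)


def encode_bcd(number: int, digit_count: int) -> bytes:
    """BCD_X: one decimal digit per nibble, least significant digit first,
    with the less significant nibble of each byte in D<3:0>."""
    if digit_count % 2:
        raise ValueError("this sketch packs an even number of digits")
    out = bytearray()
    for _ in range(digit_count // 2):
        number, low = divmod(number, 10)
        number, high = divmod(number, 10)
        out.append((high << 4) | low)
    return bytes(out)


def encode_set(members, size_bits: int) -> bytes:
    """SET_X: element 0 is bit 0 of the byte at the lowest address."""
    out = bytearray((size_bits + 7) // 8)
    for member in members:
        out[member // 8] |= 1 << (member % 8)
    return bytes(out)


print(encode_integer_16(0x1234).hex(), encode_float_32(1.0).hex(), encode_bcd(1234, 4).hex())
```

Note that the integer 0x1234 and the four BCD digits 1234 produce the same two bytes, which illustrates why an interface cannot tell data types apart from the bus traffic alone.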

Compound data types include 

• ARRAYS: An array is stored with its lowest
numbering element at the lowest address.

• RECORD: A record is stored with the first declared
element at the lowest address.

• FILE: File elements are stored in the order a file
is scanned, the first element being at the lowest
address.

• TRANSMISSION ON A SERIAL MEDIUM: In a
serial medium, the least significant bit is
transmitted first. However, some arithmetic operations
require the reverse convention in order to reduce the
logic. Examples are CRC calculation and
arbitration (comparison). In these cases, the breach in
the convention should not appear at the next
higher protocol layer.
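
The serial convention can be sketched in a few lines. The function names are hypothetical, and the sketch assumes that bytes are sent in order of increasing address, which for a little-endian datum means the least significant bit of the whole datum goes out first.

```python
def serialize_lsb_first(data: bytes) -> list:
    """Turn bytes into a bit stream, lowest address and bit 0 first."""
    bits = []
    for byte in data:
        for position in range(8):
            bits.append((byte >> position) & 1)
    return bits


def deserialize_lsb_first(bits: list) -> bytes:
    """Rebuild the bytes from a stream produced by serialize_lsb_first."""
    if len(bits) % 8:
        raise ValueError("bit stream must contain whole bytes")
    out = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for position, bit in enumerate(bits[start:start + 8]):
            byte |= bit << position
        out.append(byte)
    return bytes(out)


stream = serialize_lsb_first(b"\x01\x80")
print(stream)
```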